What is Text Analysis, Really?

Author

  • Geoffrey Rockwell
Abstract

In which the author revisits the question of what text analysis could be. He traces the tools from their origin in the concordance. He argues that text analysis tools produce new texts generated from queries through processes implemented on the computer. These new texts come from the decomposition of original texts and their recomposition into hybrid new works for interpretation. The author ends the article by presenting a portal model for how text analysis tools can be made available to the community.

Preprint of Rockwell, Geoffrey, "What is Text Analysis, Really?", Literary and Linguistic Computing, Vol. 18, No. 2, 2003, p. 209-219.

Introduction

To analyze is nothing but an operation that results from the conjunction of the preceding operations. It merely consists in composing and decomposing our ideas to create new combinations and to discover, by this means, their mutual relations and the new ideas they can produce. (Condillac, Essay on the Origin of Human Knowledge, p. 48)

In a mock confrontation between Allen Renear and Jerome McGann at the ACH/ALLC conference in 1999 at the University of Virginia, two views of what a text really is were presented. Renear put forward, for the sake of the confrontation, the OHCO (ordered hierarchy of content objects) perspective, while McGann practiced a view of text as performance. In the context of a humanities computing conference this confrontation was designed to highlight the relationship between theories of text and ways of representing texts digitally. Renear's Platonic view of the text as a real abstract object fits nicely with the dominant practice for the digital representation of texts, as represented by the guidelines of the TEI. McGann instead gave us an example of a reading that was both a performance itself and pointed to the combinatorial possibilities within and around the text.
McGann's challenge to Renear was to show that a playful reading of a text was itself a new text, and that this potential could not be captured easily by an OHCO. The confrontation succinctly reopened the question of the relationship between how we represent texts, how we use them, and our theories of textuality. What does this have to do with computer-assisted text analysis? What was not made clear in the confrontation was the role of the tools we use for accessing and manipulating digital texts; tools which I will call text analysis tools. If we are to take McGann's public performance of a reading as an analogue for what we wish to achieve with these tools, we have to think not only about how we represent the text but also about the performance of analysis and the tools that are used to perform this analysis with a computer. The logic of the tools, despite (or because of) their tendency to become transparent in use, can enhance or constrain different types of reading, which in turn makes them a better or worse fit for practices of literary criticism, including the performance of criticism. Another way of saying this is that we have a model of computer-assisted literary text analysis that is guided by a view of what a text is and how we should use it that does not match the practice of many contemporary literary critics. (It should be noted that this is not true in the field of computational linguistics and may not be true in literary criticism in the future.) Consequently, as others have pointed out, text analysis tools and the practices of literary computer analysis have not had the anticipated impact on the research community. This is often blamed on the absence of easy-to-use tools, especially tools that take advantage of OHCO, but there are two other issues that have to be taken into account.
First, the tools we have (and even those we anticipate) have emerged out of a particular tradition that I will call an "editorial" tradition, one that goes back to tools for editors of concordances, starting with Roberto Busa. Second, I believe that the moment when humanities computing could have an impact on literary criticism through the provision of critical tools (accompanied by relevant methodologies and theories that backstop the tools) is passing, as industry server-based text tools emerge instead. These industry tools provide access to licensed digital archives and satisfy our colleagues while we keep on imagining personal research text analysis tools. The community we hoped to provide with text analysis research tools has found them elsewhere while we fiddle.

Text Tools and Concording

To understand the current state of text analysis tools and their logic we can briefly review their history in terms of the practices they complement and the theories of textual practice they augment. Text analysis tools have their roots in the print concordance. The concordance is a standard research tool in the humanities that goes back to the 13th century. Concordances are examples of the sorts of "augmentation" tools that extend our scholarly reach and therefore assist in intellectual work of the sort that Vannevar Bush and Douglas Engelbart imagined. The first computer-based text-analysis tools were designed to assist in the production of print concordances. Father Roberto Busa in the late 1940s was one of the first to use information technology in the production of a concordance, his Index Thomisticus (a remarkable concordance, and more, to the works of Thomas Aquinas). His project began with index cards, moved on to analogue information technology in the 1950s, and migrated to electronic computers as they became available.
The published results were finally delivered in the 1970s, with a CD released in 1992. The technology he used was developed ad hoc as he went along, rethinking how information technology could facilitate his project. In the 1960s and 70s the first generation of tools created to be used by others became available. These were batch tools for mainframes, and they were designed, like Busa's tools, to assist in the production of paper concordances. The paper concordances would still be the tool that the rest of us used; the computing tools were for the editors of these concordances. It is interesting to review the names of some of these early tools. COCOA stands for Count and Concordance generation on the Atlas. The Oxford University Computing Service took over COCOA in 1978 and produced OCP, the Oxford Concordance Program. With the availability and increasing power of microcomputers in the 1980s, text analysis tools migrated from mainframes to personal computers. OCP evolved into Micro-OCP, and new programs came out for the personal computer, like the Brigham Young Concordance program (BYC), later renamed and commercialized as WordCruncher, and the TACT environment developed at the University of Toronto and released at the 1989 ACH/ALLC conference. When these tools became available to researchers on their personal workstations, they changed how we use tools in three ways.

First, the scholar could now use tools whenever and wherever they wanted on a personal computer, instead of having to wait for mainframe time or having to connect over a tethered terminal. In effect this meant that the humanist was no longer dependent on the paper concordance when doing research in their study, but could use electronic tools instead of print. This change in the time and place of computer-assisted text analysis, along with interface developments, led developers away from a batch concording model towards interactive tools that took advantage of the fact that the scholar would have personal access to tools and e-texts in their own time and place of study.

Secondly, with interactive tools and a more mature community of users, we began to realize we could ask new types of questions that print concordances could not support. As we experimented with new questions, we realized that one of the important things was this intellectual process of iteratively trying questions and adapting tools to help us ask new ones. We can do so much more now than find words in a string: we can ask about surrounding words, search for complex patterns, count things, compare vocabulary between characters, visualize texts, and so on.

Thirdly, as personal tools became available, we began to re-imagine the electronic text, which went from being something created by (and exclusively for) a concordance project to an electronic edition meant to be used by anyone, with whatever tools they might have, for unanticipated future research. Our models for tools and e-texts began leapfrogging each other as advances in tools triggered the need for improvements in texts. Now advances in text models and markup have surpassed the personal tools.

The Hermeneutics of Text Analysis

Let us pause now to consider the hermeneutical principles behind the concordance and the tools that extend it. As Willard McCarty puts it, "The early history of the concordance suggests that it was invented essentially for the same job to which we apply it today, 750 years later: to discover patterns of coherence in a text or textual corpus. ... the concordance very likely grew out of a habit of mind conditioned by a typological or figural view of the Bible, i.e.
the intratextual notion that the meaning of the biblical text is derived by putting together normally disjunct passages into a concordatia, a concord of senses."

The Encyclopedia Britannica Online, in its discussion of "Parallelism" as a form of Scriptural interpretation, warns against the naive use of concordances: "Parallelism, the interpretation of Scripture by means of Scripture, is a corollary of the belief in the unity of Scripture. But as a hermeneutical principle it must be employed sparingly, since the unity of Scripture should be based on comprehensive exegetical study, rather than itself provide a basis. ... One naive form of parallelism is the 'concordant' method, in which it is axiomatic that a Hebrew or Greek word will always (or nearly always) have the same force wherever it occurs in the Bible, no matter who uses it."

The hermeneutical principles underlying the use of the concordance and the text-analysis tools that evolved from it can be summarized thus:

  • First, the use of a concordance for interpreting a text presumes that there is some sort of unity to the text and a consistent use of words.

  • Second, the concordance is a new text that is assembled out of passages that agree or concord. The concorded hybrid provides a new combination of the parts of the original work. The concordance is a monster new text patched out of the old.

  • Third, a concordance is generated according to some procedure, be it a manual procedure or a process implemented in software. The procedure that generates the concordance takes as its input a query about a word or pattern. The particular concordance one looks up or generates is a text in response to a choice by the reader, generated by the software or editor according to established
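The third principle describes a generative procedure: a query over a word or pattern decomposes the text and recomposes the matching passages into a new hybrid text. That procedure can be sketched as a minimal keyword-in-context (KWIC) concordancer. The sketch below is illustrative only; the function name, parameters, and sample passage are invented for this example and are not taken from the article or from any of the tools it discusses.

```python
# A minimal keyword-in-context (KWIC) concordance: the text is decomposed
# into matches of a query, and the matching passages are recomposed into a
# new text of concorded lines for interpretation.
import re

def concordance(text, query, width=30):
    """Return one KWIC line per match of `query` (a word or regex pattern)."""
    lines = []
    for m in re.finditer(query, text, re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()].rjust(width)
        right = text[m.end():m.end() + width].ljust(width)
        lines.append(f"{left} | {m.group(0)} | {right}")
    return lines

sample = ("To analyze is nothing but an operation that results from the "
          "conjunction of the preceding operations. It merely consists in "
          "composing and decomposing our ideas to create new combinations.")

# Prints one aligned KWIC line per match of the pattern.
for line in concordance(sample, r"\bcompos\w+|\bdecompos\w+"):
    print(line)
```

Every such run produces a new text assembled out of the old, in exactly the sense of decomposition and recomposition described above: the reader's query determines which passages are brought into concord.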


Journal:
  • LLC

Volume 18, Issue -

Pages -

Published 2003